Abstract

Summarizing Likert scale ratings from human annotators is an important step in
collecting human judgments. In this project we study a novel, graph-theoretic
method for this purpose. We also analyze a few interesting properties of this
approach using real annotation datasets.


1 Introduction 

The Likert scale is a popular method for quantifying and gathering human opinion. In disciplines like
Behavioral Science, Psychology, or Human-Computer Interaction, scientists use the Likert scale to measure
subjective opinions from human annotators. As human annotations are inherently noisy, it is
customary to collect the data from more than one annotator (Fig. 1). Traditionally, the average
of these ratings is computed to get a summary.



Figure 1: Process of human annotation.


However, human annotators usually bring their own biases to the ratings. For example, some annotators
are biased towards high ratings, some towards low ratings; some annotators rate on a wide
range, while others rate on a narrow range. As a result, computing averages without addressing
these variations might lead to erroneous results.

In the classical literature on label denoising, researchers attempt to learn the underlying distribution
of annotator ratings in order to rescale them and compute the average. In this project, on the other hand, we
are interested in capturing the neighborhood information from the datapoints. The rationale for this
approach is as follows: although humans have personal biases in assigning exact values on the
Likert scale, the idea of the relative positions of the datapoints is universal. We employ a graph-based
technique to capture this neighborhood relationship among the datapoints.


Figure 2: The structure of the Annotation Matrix 


2 Literature Review 


In recent years, acquiring ground truth by aggregating the judgments of several unreliable crowd
annotators has come to be considered an important problem. A growing body of literature addresses
this problem from various perspectives. For example, there are works involving complicated generative
models for denoising and aggregating crowd opinion. As these models involve latent
variables, the solution approach typically involves the Expectation Maximization (EM) framework.
These approaches are often criticized because EM is sensitive to initialization and can get stuck in local
optima. Liu et al. approached this problem as an inference problem in probabilistic models.
They used variational methods such as Belief Propagation and Mean Field.

However, none of these approaches considers the relative neighborhood of the vertices from a graph
perspective. As the annotators assign ratings to the subjects in comparison to one another, it is
reasonable to assume that the ground truth annotations are captured in the relative distances among
the datapoints.


3 Problem Formulation 

Let the annotation data be stored in an $M \times n$ matrix $X$, as shown in Fig. 2. Each row of the
matrix represents a unique annotator and each column represents a unique datapoint. The ratings
are given on a K-point Likert scale. The goal is to formulate a linear embedding that
projects the $M$-dimensional datapoints onto a one-dimensional line while preserving the relative
neighborhood. This idea is inspired by the concept of Locality Preserving Projection proposed by
Niyogi et al. If $a$ is the intended embedding, then the projection of datapoint $X_{\cdot j}$ is $y_j$, which
can be written as in Eq. (1):


$$y_j = a^{T} X_{\cdot j} \tag{1}$$


In order to capture the neighborhood structure among the datapoints, we formulate a graph for each
individual annotator. In the graph, each node corresponds to a datapoint. In order to avoid encoding
the subjective biases, we adopt the simplest possible rule for forming the edges: two nodes are
connected by an edge if and only if the two datapoints receive identical nonzero ratings. That is,


$$w_{ij}^{(m)} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ receive identical nonzero scores from annotator } m \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
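The edge rule in Eq. (2) can be sketched in a few lines of NumPy. This is only an illustration: the function name `annotator_adjacency` is hypothetical, the toy ratings are made up, and excluding self-loops is an assumption (the rule speaks of "two nodes", so the diagonal is set to zero here).

```python
import numpy as np

def annotator_adjacency(ratings):
    """Adjacency matrix of one annotator's graph (hypothetical helper).

    ratings: 1-D sequence of Likert ratings for n datapoints, 0 = missing.
    Entry (i, j) is 1 when i != j and both datapoints received the same
    nonzero rating from this annotator, and 0 otherwise.
    """
    r = np.asarray(ratings)
    same = r[:, None] == r[None, :]                # identical ratings
    rated = (r[:, None] != 0) & (r[None, :] != 0)  # both ratings present
    W = (same & rated).astype(float)
    np.fill_diagonal(W, 0.0)  # assumption: no self-loops
    return W

# Toy ratings for six datapoints; datapoints 3 and 5 are unrated (0).
W = annotator_adjacency([5, 3, 5, 0, 3, 0])
```

In this toy case datapoints 0 and 2 are linked (both rated 5), as are 1 and 4 (both rated 3), while the two unrated datapoints stay isolated even though their stored values coincide.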


This weight assignment captures the proximity of the datapoints in a higher-dimensional space. The
more annotators agree on this proximity information, the closer the datapoints are considered to
be. To capture this structure, we form the final neighborhood graph by averaging the proximity
weights between nodes i and j over all the annotators. We then project the datapoints onto a 1D
space so that the distances among the projections preserve the neighborhood structure composed by
all the w's. Mathematically, we want to minimize the sum of squared distances between the projected
datapoints, weighted by the averaged proximity weights.
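One way to write this objective, consistent with the quadratic form that appears in the constrained problem and the Lagrangian later in the paper, is sketched here. The averaged weight $\bar{w}_{ij}$, the symbol $M$ for the number of annotators, and the factor $1/2$ are notational assumptions made for this sketch; the equality itself is the standard identity between a weighted sum of squared differences and the Laplacian quadratic form.

```latex
\bar{w}_{ij} = \frac{1}{M}\sum_{m=1}^{M} w_{ij}^{(m)}, \qquad
\frac{1}{2}\sum_{i,j} (y_i - y_j)^2 \, \bar{w}_{ij}
  = y^{T}\left(D_{\text{avg}} - A_{\text{avg}}\right) y
  = a^{T} X L_{\text{avg}} X^{T} a,
\qquad \text{with } y = X^{T} a .
```

Expanding the square gives $\sum_i y_i^2 d_i - y^{T} A_{\text{avg}} y$, where $d_i = \sum_j \bar{w}_{ij}$ is the $i$-th diagonal entry of $D_{\text{avg}}$, which is exactly the middle expression.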
Here, $L_m := D_m - A_m$ is called the graph Laplacian of the $m$-th annotator's graph, and $D_m$ and $A_m$ are
the degree matrix and adjacency matrix, respectively, of that graph. As the graphs are
undirected, $A_m$ is symmetric. $D_m$ is obtained by taking the row-wise or column-wise sums of $A_m$ and
placing the sums on the diagonal. $L_{\text{avg}} = D_{\text{avg}} - A_{\text{avg}}$ is the corresponding average Laplacian.
As the degree of a node is a natural measure of node
importance, we impose the constraint $a^{T} X D_{\text{avg}} X^{T} a = 1$. Here $D_{\text{avg}}$ is the average degree matrix over
all the annotators. Therefore, the optimization problem becomes:


$$\operatorname*{argmin}_{a} \; a^{T} X L_{\text{avg}} X^{T} a \quad \text{s.t.} \quad a^{T} X D_{\text{avg}} X^{T} a = 1 \tag{4}$$


4 Optimization 

We use a Lagrange multiplier $\lambda$ to construct the objective function as follows:

$$\mathcal{L}(a) = a^{T} X L_{\text{avg}} X^{T} a + \lambda \left(1 - a^{T} X D_{\text{avg}} X^{T} a\right). \tag{5}$$

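Both the Laplacian matrix and the degree matrix are positive semi-definite, so the objective and the
constraint are both convex quadratic forms in $a$. Differentiating $\mathcal{L}$ with respect to $a$ and setting the
derivative to zero, we get

$$\frac{\partial \mathcal{L}}{\partial a} = 2 X L_{\text{avg}} X^{T} a - 2 \lambda X D_{\text{avg}} X^{T} a = 0 \;\Rightarrow\; X L_{\text{avg}} X^{T} a = \lambda X D_{\text{avg}} X^{T} a. \tag{6}$$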

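Therefore, $a$ can be obtained by solving the generalized eigenvalue problem shown in Eq. (6).
Left-multiplying Eq. (6) by $a^{T}$ and applying the constraint shows that the objective value equals $\lambda$,
so in order to get the solution that minimizes $\mathcal{L}$, we take the eigenvector with the smallest
corresponding eigenvalue.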
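As a rough numerical illustration of this step, the generalized eigenvalue problem can be handed to `scipy.linalg.eigh`, which returns eigenvalues in ascending order and normalizes each eigenvector $v$ so that $v^{T} B v = 1$, matching the constraint above. The function name `solve_embedding` and the toy matrices are assumptions made for this sketch, and the call requires the right-hand matrix $X D_{\text{avg}} X^{T}$ to be positive definite.

```python
import numpy as np
from scipy.linalg import eigh

def solve_embedding(X, L_avg, D_avg):
    """Smallest generalized eigenpair of (X L X^T) a = lambda (X D X^T) a.

    Hypothetical helper; assumes X D_avg X^T is positive definite.
    """
    lhs = X @ L_avg @ X.T
    rhs = X @ D_avg @ X.T
    eigvals, eigvecs = eigh(lhs, rhs)  # ascending generalized eigenvalues
    return eigvecs[:, 0], eigvals[0]

# Toy data (made up): 2 annotators, 4 datapoints.
X = np.array([[5.0, 3.0, 5.0, 4.0],
              [4.0, 2.0, 4.0, 2.0]])
# Averaged adjacency for these ratings: both annotators link 0-2,
# only the second annotator links 1-3.
A_avg = np.array([[0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 0.5],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0, 0.0]])
D_avg = np.diag(A_avg.sum(axis=1))
L_avg = D_avg - A_avg

a, lam = solve_embedding(X, L_avg, D_avg)
y = X.T @ a  # 1D projections of the four datapoints
```

The sign of the returned eigenvector is not determined by the problem, so `y` may come out in either orientation.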

5 Algorithm 

From the discussions in Section 3 and Section 4, we can formulate an algorithm for computing the
intended 1D embedding. We show it in Algorithm 1.

6 Data 

We apply the algorithm on the following two datasets: 

• Job interview dataset, and

• Public speaking dataset.

We used the annotators' responses on the overall performance of the study participants. The data
matrix, X, for both datasets is shown in Fig. 3. There are 4 annotators and 138 datapoints (participants)
in the job interview dataset. It is particularly evident from the picture of the job interview dataset
that annotators 1 and 4 tend to give higher ratings than annotators 2 and 3. In the public
speaking dataset, there are 15 annotators and 51 datapoints. However, there are a number of missing
values (recorded as a rating of zero) in the public speaking dataset.
Input: Annotation matrix, $X$
Output: Denoised annotations, $y$

Initialize:;

foreach annotator $m$ do
    Construct a graph with adjacency matrix $A_m$, where each node represents a datapoint and
    edges satisfy Eq. (2);
end

$A_{\text{avg}} \leftarrow \frac{1}{M} \sum_{m=1}^{M} A_m$;

Construct a diagonal matrix, $D_{\text{avg}}$, with entries given by the row sums of $A_{\text{avg}}$;

$L_{\text{avg}} := D_{\text{avg}} - A_{\text{avg}}$;

Calculate the generalized eigenvalue solution for Eq. (6);

Return the normalized eigenvector with the smallest corresponding eigenvalue;

Algorithm 1: Algorithm for denoising the manual annotation using graph theory
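The steps of Algorithm 1 can be strung together as in the following sketch. It is not the authors' implementation: the function name `denoise_annotations`, the exclusion of self-loops, the small ridge added for numerical stability, and the toy matrix are all assumptions made here. Missing ratings are taken to be stored as zeros, as in the datasets described above.

```python
import numpy as np
from scipy.linalg import eigh

def denoise_annotations(X, ridge=1e-9):
    """Sketch of Algorithm 1 (hypothetical helper).

    X: (annotators x datapoints) rating matrix, 0 = missing.
    Returns the embedding a and the 1D projections y = X^T a.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]

    # One adjacency matrix per annotator, stacked as (M, n, n):
    # an edge links two datapoints with identical nonzero ratings.
    same = X[:, :, None] == X[:, None, :]
    rated = (X[:, :, None] != 0) & (X[:, None, :] != 0)
    A = (same & rated).astype(float)
    A[:, np.arange(n), np.arange(n)] = 0.0  # assumption: no self-loops

    A_avg = A.mean(axis=0)              # average over annotators
    D_avg = np.diag(A_avg.sum(axis=1))  # row sums on the diagonal
    L_avg = D_avg - A_avg

    lhs = X @ L_avg @ X.T
    rhs = X @ D_avg @ X.T
    # Assumption, not in the source: a tiny ridge keeps rhs positive
    # definite when it is numerically close to singular.
    rhs = rhs + ridge * np.eye(rhs.shape[0])

    _, eigvecs = eigh(lhs, rhs)  # ascending generalized eigenvalues
    a = eigvecs[:, 0]            # smallest-eigenvalue eigenvector
    return a, X.T @ a

# Toy example (made up): 3 annotators, 6 datapoints.
X_toy = np.array([[5, 5, 3, 3, 1, 1],
                  [7, 6, 6, 4, 4, 2],
                  [4, 4, 4, 2, 2, 2]])
a, y = denoise_annotations(X_toy)
```

Stacking the per-annotator graphs into one array avoids an explicit loop; for larger datasets a loop that accumulates the average would use less memory.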



Figure 3: Job Interview Dataset (Left) and Public Speaking Dataset (Right)


7 Results 


The results for the job interview dataset are shown in Fig. 4. The topmost row shows the original
data matrix (X) on the left and the average of the annotators' scores on the right.

In the second row, we sort the datapoints in ascending order of the mean ratings for representational
convenience. The columns of the data matrix are sorted accordingly. We also plot the
denoised values of the ratings (i.e., the projected values obtained by the proposed algorithm) with red
markers. It is interesting to notice that the projected values follow a sequence that is flipped relative
to the mean ratings. This is because our proposed optimization selects an embedding by
preserving only the relative neighborhood among the datapoints; it does not preserve the absolute
values. Moreover, if $a$ solves Eq. (6), then so does $-a$, so the sign of the embedding is arbitrary.
Consequently, the projected values might be either positively or negatively correlated with
the mean values. In practice, however, this is not a big concern, as we can always flip the sequence
by subtracting all the values in the sequence from the maximum allowable rating.
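The flipping step described above could be automated roughly as follows. The subtraction from the maximum rating comes from the text; deciding when to flip by checking the sign of the Pearson correlation with the mean ratings is an assumption added for this sketch, as are the function name `orient_like_mean` and the toy numbers.

```python
import numpy as np

def orient_like_mean(y, mean_ratings, max_rating):
    """Flip y when it is negatively correlated with the mean ratings.

    Hypothetical helper. Flipping subtracts every value from max_rating,
    which reverses the order but keeps the spacing between datapoints.
    """
    y = np.asarray(y, dtype=float)
    corr = np.corrcoef(y, mean_ratings)[0, 1]  # Pearson correlation
    return max_rating - y if corr < 0 else y

# Toy values (made up) that run opposite to the mean ratings.
y_raw = np.array([6.5, 5.0, 2.0])
means = np.array([1.5, 3.0, 6.0])
y_flipped = orient_like_mean(y_raw, means, max_rating=7)
```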

The third row shows the data matrix and the accumulated plots, sorted by the
denoised ratings. If we compare the scatter plot (right-hand side) of the second row with
that of the third, it is evident that the denoised ratings give finer discrimination among the datapoints
than the mean values. As there are only four annotators, the means cannot discriminate among
datapoints that differ by less than 1/4 of a rating point. As a result, significantly more datapoints receive the same rating
(notice the blue line in the right-hand plot of the second row). However, as the denoised ratings
consider the neighborhoods of the datapoints, they can discriminate in finer detail (notice the red
line in the right-hand plot of the third row).
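The resolution limit of the mean can be checked by brute force. The sketch below enumerates every combination of integer ratings from four annotators on a 7-point scale (7 is the maximum rating used in this section) and collects the distinct mean values; the variable names are illustrative only.

```python
import numpy as np
from itertools import product

K = 7               # points on the Likert scale
num_annotators = 4  # as in the job interview dataset

# Every possible combination of ratings from the four annotators.
means = {
    sum(combo) / num_annotators
    for combo in product(range(1, K + 1), repeat=num_annotators)
}
steps = np.diff(sorted(means))  # gaps between consecutive mean values
```

Only 25 distinct means are possible, spaced 0.25 apart, so 138 datapoints necessarily share many identical mean ratings.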
Figure 4: Results of denoising for Job Interview Dataset

Figure 5: Results of denoising for Public Speaking Dataset


In the fourth row, the denoised ratings are flipped by subtracting them from the maximum rating, 7.
This makes the denoised ratings positively correlated with the mean values of the ratings. The
datapoints are re-sorted in ascending order of the flipped denoised ratings.

In Fig. 5, we show similar results for the public speaking dataset. While calculating the mean, we
omitted the missing values entirely so that they do not bias the mean ratings. In this dataset, the mean values
show finer discrimination than in the job interview dataset. This is due to the higher
number of annotators. An interesting phenomenon is that the denoised ratings particularly accentuate
the poor quality of three datapoints (notice the three leftmost points on the red line in the 2nd row), while
the mean ratings “smooth out” the differences.
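Omitting missing values when computing the mean can be sketched as below, assuming missing ratings are stored as zeros as in the public speaking dataset. The function name `mean_ignoring_missing` and the toy matrix are hypothetical; returning NaN for a datapoint that nobody rated is a choice made for this sketch.

```python
import numpy as np

def mean_ignoring_missing(X):
    """Per-datapoint mean over observed ratings only (0 = missing)."""
    X = np.asarray(X, dtype=float)
    counts = (X != 0).sum(axis=0)  # annotators who rated each datapoint
    sums = X.sum(axis=0)           # zeros add nothing to the sum
    # Divide only where at least one rating exists; NaN elsewhere.
    return np.divide(sums, counts,
                     out=np.full(X.shape[1], np.nan),
                     where=counts > 0)

# Toy matrix (made up): 3 annotators, 3 datapoints.
X_toy = np.array([[5, 0, 0],
                  [7, 4, 0],
                  [6, 0, 0]])
means = mean_ignoring_missing(X_toy)
naive = X_toy.mean(axis=0)  # treats the zeros as real ratings
```

For the second datapoint, the masked mean is 4.0, whereas the naive column mean is pulled down to about 1.33 by the two missing entries.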

8 Future Work and Conclusion 

In this project, we proposed a novel technique for summarizing the annotators' opinions. The technique
employs a graph structure that captures the relative neighborhood among the datapoints. We
applied the technique to two datasets and compared it with simple mean ratings.

In the future, we will try to apply this technique to subjective data for which ground truth information
is available. Having the ground truth will enable us to better quantify the quality of this metric.